
    Automatic annotation for weakly supervised learning of detectors

    PhD thesis
    Object detection in images and action detection in videos are among the most widely studied computer vision problems, with applications in consumer photography, surveillance, and automatic media tagging. Typically, these standard detectors are fully supervised, that is, they require a large body of training data in which the locations of the objects/actions in the images/videos have been manually annotated. With the emergence of digital media and the rise of high-speed internet, raw images and video are available at little to no cost. However, the manual annotation of object and action locations remains tedious, slow, and expensive. As a result, there has been great interest in training detectors with weak supervision, where only the presence or absence of the object/action in an image/video is needed, not its location. This thesis presents approaches for weakly supervised learning of object/action detectors, with a focus on automatically annotating object and action locations in images/videos using only binary weak labels indicating the presence or absence of the object/action.

    First, a framework for weakly supervised learning of object detectors in images is presented. In the proposed approach, a variation of the multiple instance learning (MIL) technique automatically annotates object locations in weakly labelled data and, unlike existing approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial annotation is then used to start an iterative process in which standard object detectors refine the location annotation. To ensure that the iterative training of detectors does not drift from the object of interest, a scheme for detecting model drift is also presented. Furthermore, unlike most other methods, our weakly supervised approach is evaluated on data without manual pose (object orientation) annotation.

    Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues, is carried out. From this analysis, a new method based on negative mining (NegMine) is presented for the initial annotation of both object and action data. The NegMine-based approach is a much simpler formulation that uses only an inter-class measure and requires no complex combinatorial optimisation, yet it can meet or outperform existing approaches, including the previously presented inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing approaches to boost their performance.

    Finally, the thesis takes a step back and looks at the use of generic object detectors as prior knowledge in weakly supervised learning of object detectors. These generic object detectors are typically based on sampling saliency maps that indicate whether a pixel belongs to the background or the foreground. A new approach to generating saliency maps is presented that, unlike existing approaches, looks beyond the current image of interest and into images similar to it. We show that our generic object proposal method can be used by itself to annotate the weakly labelled object data with surprisingly high accuracy.
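    The annotation pipeline outlined in this abstract (initialise boxes in weakly labelled positives, retrain a detector, re-localise, and guard against drift) can be sketched roughly as below. This is a minimal illustration under assumed interfaces, not the thesis's implementation: `train_detector`, its `score` method, the proposal features, and both 0.5 thresholds are placeholders, and the initialisation shown uses only an inter-class, negative-mining-style measure.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def negmine_init(proposal_feats, negative_feats):
    """Inter-class initial annotation: pick the proposal whose feature is
    least similar to windows sampled from negative (object-absent) images."""
    dists = np.linalg.norm(
        proposal_feats[:, None, :] - negative_feats[None, :, :], axis=-1)
    return int(dists.min(axis=1).argmax())

def refine_annotations(image_ids, proposals, feats, negative_feats,
                       train_detector, n_rounds=5, drift_iou=0.5):
    """Iteratively re-annotate weakly labelled positive images.

    `train_detector(boxes)` is an assumed factory returning an object with a
    `score(proposals)` method; the thresholds are illustrative only.
    """
    # initial annotation: best proposal per positive image by the inter-class measure
    current = {i: proposals[i][negmine_init(feats[i], negative_feats)]
               for i in image_ids}
    for _ in range(n_rounds):
        detector = train_detector(current)            # retrain on current boxes
        previous = dict(current)
        for i in image_ids:
            best = int(np.argmax(detector.score(proposals[i])))
            current[i] = proposals[i][best]           # re-localise each image
        # crude drift check: revert and stop if too many annotations jump away
        moved = np.mean([iou(previous[i], current[i]) < drift_iou
                         for i in image_ids])
        if moved > 0.5:
            current = previous
            break
    return current
```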

    An Evaluation of Deep CNN Baselines for Scene-Independent Person Re-Identification

    In recent years, a variety of proposed methods based on deep convolutional neural networks (CNNs) have improved the state of the art for large-scale person re-identification (ReID). While a large number of optimizations and network improvements have been proposed, there has been relatively little evaluation of the influence of training data and baseline network architecture. In particular, it is usually assumed either that networks are trained on labeled data from the deployment location (scene-dependent), or else adapted with unlabeled data, both of which complicate system deployment. In this paper, we investigate the feasibility of achieving scene-independent person ReID by forming a large composite dataset for training. We present an in-depth comparison of several CNN baseline architectures for both scene-dependent and scene-independent ReID, across a range of training dataset sizes. We show that scene-independent ReID can produce leading-edge results, competitive with unsupervised domain adaptation techniques. Finally, we introduce a new dataset for comparing within-camera and across-camera person ReID.

    Comment: To be published in the 2018 15th Conference on Computer and Robot Vision (CRV).
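    A composite training set of the kind described above can be formed by concatenating several labelled ReID datasets into a single identity label space and training an ordinary CNN baseline as an identity classifier. The sketch below is a hypothetical illustration in PyTorch, not the paper's code; the source datasets, their identity counts, and the choice of ResNet-50 are assumptions.

```python
import torch
from torch import nn
from torch.utils.data import ConcatDataset, Dataset
from torchvision import models

class OffsetLabels(Dataset):
    """Wrap one source ReID dataset so its identity labels do not collide
    with those of the other sources in the composite."""
    def __init__(self, base, offset):
        self.base, self.offset = base, offset
    def __len__(self):
        return len(self.base)
    def __getitem__(self, idx):
        img, person_id = self.base[idx]   # assumed (image, identity) pairs
        return img, person_id + self.offset

def build_composite(sources, num_ids_per_source):
    """Merge several labelled ReID datasets into one composite label space."""
    wrapped, offset = [], 0
    for ds, n_ids in zip(sources, num_ids_per_source):
        wrapped.append(OffsetLabels(ds, offset))
        offset += n_ids
    return ConcatDataset(wrapped), offset

def make_baseline(total_ids):
    """ResNet-50 identity classifier as a generic CNN baseline."""
    net = models.resnet50(weights=None)
    net.fc = nn.Linear(net.fc.in_features, total_ids)
    return net
```

    At deployment, the classifier head is discarded and query/gallery images are compared by distance between pooled features, which is what lets the same trained model be used in scenes it never saw during training.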

    Quantifying the Frequency and Orientation of Mitoses in Embryonic Epithelia

    The miraculous birth of a new life starts with the formation of an embryo. The process by which an embryo is formed, embryogenesis, has been studied and shown to consist of three types of processes: mitosis, cell differentiation, and morphogenetic movements. Scientists and medical doctors are still at a loss to explain the fundamental forces driving embryo development, and the causes of birth defects remain largely unknown. Recent efforts by the Embryo Biomechanics Lab at the University of Waterloo have shown a relationship between the morphogenetic movements that occur during embryo formation and the frequency and orientation of mitosis. To further study this relationship, a means of automatically identifying the frequency and orientation of mitosis in time-lapse images of embryo epithelia is needed. Past efforts at identifying mitosis have been limited to the study of cell cultures and stained tissue segments.

    Two methods for identifying mitosis in contiguous sheets of cells are developed. The first is based on local motion analysis and the second on intensity analysis. These algorithms were tested on images of early and late stage embryos of the axolotl (Ambystoma mexicanum), a type of amphibian. The performance of the algorithms was measured using the F-measure, which expresses the true mitosis detection rate penalized by the false mitosis detection rate.

    The motion-based algorithm had performance rates of 68.2% on the early stage image set and 66.7% on the late stage image set, whereas the intensity-based algorithm had performance rates of 73.9% on the early stage image set and 90.0% on the late stage image set. The mitosis orientation errors for the motion-based algorithm were 27.3 degrees average error with a standard deviation (std.) of 19.8 degrees for the early stage set and 34.8 degrees average error with a std. of 23.5 degrees for the late stage set. For the intensity-based algorithm, the orientation errors were 39.8 degrees average with a std. of 28.9 degrees for the early stage image set and 15.7 degrees average with a std. of 18.9 degrees for the late stage image set. Of the two algorithms presented, the intensity-based algorithm had the best performance, and it performs best on high-magnification images. Its performance is limited by mitoses in adjacent cells and by the presence of natural variations in cell pigment. The algorithms presented here offer a powerful new set of tools for evaluating the role of mitoses in embryo morphogenesis.
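    The two evaluation quantities quoted above, the F-measure and the mitosis orientation error, can be computed as in the short sketch below. It assumes detections have already been matched to ground-truth mitoses and treats the division axis as undirected (errors folded into 0 to 90 degrees); the thesis's exact matching rules are not stated in the abstract.

```python
import numpy as np

def f_measure(true_detections, false_detections, total_mitoses):
    """Harmonic mean of precision and recall for matched mitosis detections."""
    precision = true_detections / max(true_detections + false_detections, 1)
    recall = true_detections / max(total_mitoses, 1)
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def orientation_errors(predicted_deg, true_deg):
    """Angular error between predicted and true division axes.

    A division axis is undirected, so the error is folded into [0, 90] degrees.
    """
    diff = np.abs(np.asarray(predicted_deg) - np.asarray(true_deg)) % 180.0
    return np.minimum(diff, 180.0 - diff)

# Example: mean and standard deviation of the orientation error
errs = orientation_errors([10.0, 95.0, 170.0], [12.0, 80.0, 5.0])
print(errs.mean(), errs.std())
```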

    Efficient Deep Feature Learning and Extraction via StochasticNets

    Deep neural networks are a powerful tool for feature learning and extraction given their ability to model high-level abstractions in highly complex data. One area worth exploring in feature learning and extraction using deep neural networks is efficient neural connectivity formation for faster feature learning and extraction. Motivated by findings of stochastic synaptic connectivity formation in the brain, as well as the brain's uncanny ability to efficiently represent information, we propose the efficient learning and extraction of features via StochasticNets, where sparsely-connected deep neural networks are formed via stochastic connectivity between neurons. To evaluate the feasibility of such a deep neural network architecture for feature learning and extraction, we train deep convolutional StochasticNets to learn abstract features using the CIFAR-10 dataset, and extract the learned features from images to perform classification on the SVHN and STL-10 datasets. Experimental results show that features learned using deep convolutional StochasticNets, with fewer neural connections than conventional deep convolutional neural networks, can allow for classification accuracy better than or comparable to that of conventional deep neural networks: relative test error decreases of ~4.5% for classification on the STL-10 dataset and ~1% for classification on the SVHN dataset. Furthermore, it was shown that the deep features extracted using deep convolutional StochasticNets can provide comparable classification accuracy even when only 10% of the training data is used for feature learning. Finally, it was also shown that significant gains in feature extraction speed can be achieved in embedded applications using StochasticNets. As such, StochasticNets allow for faster feature learning and extraction while maintaining better or comparable accuracy.

    Comment: 10 pages. arXiv admin note: substantial text overlap with arXiv:1508.0546
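    The central idea of stochastic connectivity formation, keeping only a random subset of potential synapses chosen when the network is formed, can be illustrated with a masked fully-connected layer as below. This is a simplified sketch, not the paper's formation procedure: the connection probability and the use of a linear (rather than convolutional) layer are assumptions; a convolutional analogue would mask individual kernel entries in the same way.

```python
import torch
from torch import nn
from torch.nn import functional as F

class StochasticLinear(nn.Linear):
    """Fully-connected layer whose synapses are formed stochastically.

    Each potential connection is kept with probability `p_connect` when the
    layer is created, and the resulting binary mask stays fixed during
    training and inference, so only a sparse subset of weights is ever used.
    """
    def __init__(self, in_features, out_features, p_connect=0.5):
        super().__init__(in_features, out_features)
        mask = (torch.rand(out_features, in_features) < p_connect).float()
        self.register_buffer("connectivity", mask)

    def forward(self, x):
        # only the stochastically formed connections contribute
        return F.linear(x, self.weight * self.connectivity, self.bias)

# Example: a layer keeping roughly 40% of the usual connections
layer = StochasticLinear(256, 128, p_connect=0.4)
out = layer(torch.randn(8, 256))
```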